licensure as a marriage and family therapist (MFT). Taken together, results 
suggest that even if candidates took the examinations later in the cycle, 
there was no clear indication that information obtained from candidates who 
took the test early in the cycle improved performance. The most important 
implication for small testing programs is that they can enjoy the benefits of 
computer administration without having large item pools and candidate 

A. OBJECTIVES OF THE INQUIRY 

Many small licensing and certification programs are attracted to computer-based testing, 
rather than computerized adaptive testing, as a medium for test delivery. Computer-based testing 
usually involves fixed forms offered on a computer for a specified period of time. There are many 
advantages of computer technology: better administrative control, randomized ordering of items for 
each candidate, flexible candidate scheduling, and immediate delivery of scores to candidates (Way, 
1998). Despite the efficiencies of computer technology, there is concern about item security, 
particularly if a fixed form is available for several months. Thus, the concern underlying item 
exposure is that candidates who have prior knowledge of examination content will achieve higher 
scores than candidates with no prior knowledge. 

Most of the literature regarding item exposure concentrates on computerized adaptive testing. 
Several studies have addressed item exposure in a computerized adaptive testing environment 
in terms of its effect on examinee scores and of different item selection strategies, 
such that as item exposure increases, measurement precision decreases (Hale, Angelis, & 
Thibodeau, 1983; O’Neill, Lunz, & Thiede, 2000; Pastor, Dodd, & Chang, 2002; Stocking & Lewis, 
1998; Stocking & Swanson, 1998; Way, 1998). These authors propose a number of techniques to 
control item exposure yet maintain measurement precision. There are two main approaches to 
controlling item exposure in computerized adaptive testing: randomized item selection (e.g., 
Bergstrom, Lunz, & Gershon, 1992; McBride & Martin, 1983) and conditional item selection (e.g., 
Stocking & Lewis, 1998; Sympson & Hetter, 1985). Alternative procedures such as the stratified 
design procedure attempt to control item exposure by stratifying the item selection mechanism 
(Chang & Ying, 1999). 

Other studies have found that administering the same performance-based 
examination to different cohorts of students over several days or weeks resulted in no consistent 
change across student cohorts (Colliver, Barrows, Vu, Verhulst, Mast & Travis, 1991; Stillman, 
Haley, Sutnick, Philbin, Smith, O’Donnell, & Pohl, 1991; Rutala, Witzke, Leko, Fulginiti, & 
Taylor, 1991). Some studies indicated that access to test information from students early in the 
cycle did not affect the test scores of students who took the examination later in the cycle (Skakun, 
Cook, & Morrison, 1992; Swartz, Colliver, Cohen, & Barrows, 1995). 

In sum, the findings from the aforementioned studies are mixed. Two studies note that there 
can be differences in testing outcomes that are statistically but not practically significant (O’Neill et 
al., 2000; Stocking, Ward, & Potenza, 1998). Nonetheless, there appear to be a variety of factors 
that affect test outcomes when items are exposed throughout a computerized testing cycle. There 
are psychometric factors: item structure and test specifications (Stocking & Lewis, 1998), examinee 
ability (O’Neill et al., 2000), test length and item pool size (Pastor et al., 2002; Revuelta & 
Ponsoda, 1998; Way, 1998), type of examination such as standardized patient examinations 
(Macmillan, De Champlain, & Klass, 1999), and exposure rate and percent of item overlap 
(Stocking et al., 1998; Way, 1998). Non-psychometric factors could also affect test outcomes, such 
as the specific recall effect, or an increase in performance due to recall of specific questions and 
answers (Hale et al., 1983). 

The effects of either psychometric or non-psychometric factors have yet to be studied in 
terms of fixed forms administered in a continuous testing environment. Item exposure in a 
continuous testing environment is of particular significance because items are available over a 
relatively long period of time during an administrative cycle, and 30% of the items may be reused 
as anchor items from one form to the next for a subsequent cycle. The important question to be 
asked is: if the same set of items were available throughout a cycle, would candidate performance 
increase from the beginning of the cycle to the end of the cycle? A related question to be asked is 
what could account for changes in the percentage passing over the course of a cycle? The answers 
to these questions are of obvious interest to small testing programs, whose item banks may be small 
and whose candidate pools are not sufficient to use computer-adaptive testing delivery systems. 

In the present study, the effect of item exposure on two conventional examinations 
administered as computer-based tests was explored. A principal hypothesis was that item exposure 
would have little or no effect on average difficulty of the items over the course of an administrative 
cycle. There are several factors that underlie this hypothesis. First, only small changes in the 
percent passing had been observed in previous administrations. Second, the examinations were 
relatively long (175 items each). Third, the content of the items spanned a broad range of topics in 
their respective content specifications, e.g., assessment, therapeutic interventions, and legal and 
ethical responsibilities. Fourth, each candidate received a different random order of items. Finally, 
many of the items required the candidates to apply their clinical knowledge and training to a case 
scenario, a format in which it is difficult to memorize specific bits of information. 

The hypothesis was tested by exploring conventional item statistics and Rasch estimates of 
ability and difficulty of four separate groups of candidates who took a licensing examination in a 
continuous testing environment over the course of a 6-month administration cycle. 

B. SOURCES OF INFORMATION PRESENTED 
Subjects. Subjects were 1,001 candidates for a state license as a clinical social worker 
(LCSW) and 1,660 candidates for a state license as a marriage and family therapist (MFT). 

Examinations. The examinations were administered as licensing examinations for LCSWs 
and MFTs. All examinations were multiple-choice and contained 175 items. The examinations 
used in the present study were two forms of an LCSW examination and two forms of an MFT 
examination. The amount of overlap of items between forms was 30%. Cut scores for each 
examination were established with a modified Angoff procedure. 

Computer-based testing environment. The examinations were administered as fixed, linear 
forms in a continuous testing environment. There were four 6-month periods of administration: 
November 16, 2000-May 15, 2001; May 16, 2001-November 15, 2001; January 2, 2001-June 29, 
2001; and June 30, 2001-December 31, 2001. Candidates were allowed to take the 
examination only once within each 6-month period. Candidates who failed the examination 
during a given 6-month period could retake the examination when another form became available in 
the following 6-month period. 

C. METHODS 

Design. For each of the four examinations, data were partitioned into three 2-month periods to 
create a means to analyze changes within an administrative cycle. For example, for the November 16, 
2000-May 15, 2001 administration of the LCSW examination, data were divided into three 2-month 
periods (first, second, third). 
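
The partitioning step could be sketched as follows. This is only an illustration, not the authors' procedure: the function names are hypothetical, and the assumption that each 2-month period begins on the same day of the month as the cycle start is ours, since the source does not state the exact boundary dates. The example dates are taken from the first administration period listed above.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's last day."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def period_of(test_date: date, cycle_start: date, cycle_end: date) -> int:
    """Return 1, 2, or 3 for the 2-month period containing test_date.

    Assumption: period boundaries fall 2 and 4 calendar months after
    cycle_start; the study does not report the exact boundaries used.
    """
    if not cycle_start <= test_date <= cycle_end:
        raise ValueError("test date falls outside the administration cycle")
    if test_date < add_months(cycle_start, 2):
        return 1
    if test_date < add_months(cycle_start, 4):
        return 2
    return 3


# Example: the November 16, 2000-May 15, 2001 cycle.
start, end = date(2000, 11, 16), date(2001, 5, 15)
periods = [period_of(d, start, end)
           for d in (date(2000, 12, 1), date(2001, 2, 1), date(2001, 5, 15))]
```

Each candidate record would then carry a period label, so that item statistics can be computed separately for each 2-month period and for the full cycle.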

Data analyses. Conventional item analyses were performed on each examination to 
calculate conventional item statistics for each 2-month period and for the entire 6-month period. 
There were not sufficient numbers of candidates who sat for the examination to meet the criteria for 
two-parameter or three-parameter models (e.g., Hulin, Lissak, & Drasgow, 1983); however, 
estimates could be obtained with a Rasch model (e.g., Green, Bock, Humphreys, Linn, & Reckase, 
1984; Wright & Stone, 1979). In the present study, separate Rasch analyses were performed to 
obtain mean difficulty estimates and mean ability estimates for each 2-month period and for the 
entire 6-month period. 
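
As a rough illustration of the conventional statistics involved, the sketch below computes classical item difficulty (proportion correct), coefficient alpha, and percent passing from a matrix of scored responses. The function names, the 0/1 scoring layout, and the toy data are assumptions made for the example; the source does not describe the software or data format actually used.

```python
from statistics import pvariance


def item_p_values(responses):
    """Proportion of candidates answering each item correctly.

    responses: list of candidate rows, each a list of 0/1 item scores.
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_candidates
            for j in range(n_items)]


def coefficient_alpha(responses):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(responses[0])
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def percent_passing(responses, cut_score):
    """Percentage of candidates whose raw score meets or exceeds the cut score."""
    totals = [sum(row) for row in responses]
    return 100 * sum(t >= cut_score for t in totals) / len(totals)


# Toy data: 4 candidates by 3 items (hypothetical, for illustration only).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
p_values = item_p_values(toy)
alpha = coefficient_alpha(toy)
passing = percent_passing(toy, cut_score=2)
```

Running the same functions on the candidates from each 2-month period, and again on the full 6-month cycle, gives the period-by-period comparison described above.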

D. RESULTS AND CONCLUSIONS 

Tables 1 and 2 summarize the mean test scores, ability estimates, and difficulty estimates for 
both examinations. Overall, the effect of item exposure was slight. For all four examinations, there 
was no systematic increase over the 6-month period in terms of the percent passing or coefficient 
alpha. For the conventional item analyses, the mean test scores and standard deviations of scores 
changed only slightly (range of mean scores: 1.22 to 3.60 points; range of standard deviations: 1.13 
to 2.38 points) during each 2-month period in an administrative cycle. 

For the Rasch analyses where difficulty was centered on zero, there were only slight changes 
in the mean and standard deviations of ability estimates in each 2-month period for a given 
examination (see Figures 1, 2, 3, 4). When ability was centered on zero, there were only slight 
changes in mean and standard deviations of the difficulty estimates. 
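
For readers unfamiliar with the centering conventions mentioned here, the standard dichotomous Rasch model can be written as below. The symbols are our notation, not the authors': \theta_n is the ability of candidate n, b_i is the difficulty of item i, N is the number of candidates, and I is the number of items. Because only the difference \theta_n - b_i enters the model, the origin of the scale is arbitrary and must be fixed by a constraint.

```latex
% Probability that candidate n answers item i correctly
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

% Difficulty centered on zero (abilities are then read relative to the mean item)
\sum_{i=1}^{I} b_i = 0

% Ability centered on zero (difficulties are then read relative to the mean candidate)
\sum_{n=1}^{N} \theta_n = 0
```

Under the first constraint, shifts in mean ability across periods would signal a change in performance on a fixed set of items; under the second, the same change would appear as a shift in mean item difficulty.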

Taken together, the results suggest that even if candidates took the examination later in the 
cycle, there was no clear indication that information obtained from candidates who took the 
examination early in the cycle improved performance. 

E. EDUCATIONAL IMPORTANCE OF THE STUDY 

While the results of the present study are preliminary, the study provides information on 
how item exposure functions in a realistic context of licensure examinations administered in a 
continuous testing environment. When a fixed form was administered in a continuous testing 
environment, the performance of candidates did not increase consistently over time. The most 
important implication for small testing programs is that they can enjoy the benefits of computer 
administration without having large item pools and candidate populations or using computer 
adaptive delivery systems. 